This project trains a model that predicts the win rate of a League of Legends game.
League of Legends, also known as League, is a multiplayer online battle arena video game.
In a typical game, two teams of five players each face off against each other on a battlefield. Each player controls a champion with unique abilities and characteristics, and the goal is to destroy the enemy team’s base. The champions are divided into different classes, such as marksmen, mages, and tanks, and each player must carefully select their champion and coordinate with their teammates to succeed. The game is known for its complex strategy and intense team play.
Players can earn experience points and gold by defeating enemy champions and minions. These resources can be used to level up the player’s champion and purchase items that grant additional abilities and bonuses. As the game progresses, champions become stronger and gain access to more powerful abilities. The game also features a variety of different maps, each with its own unique layout and challenges. The outcome of a match is determined by a combination of skill, strategy, and teamwork.
Destroying the base (aka the Nexus).
Each team’s base is heavily guarded by 11 turrets from three directions. These turrets deal a massive amount of damage to enemy champions and minions when they are in range. In order to deal damage to the enemy base, a team must first destroy at least five of these turrets. This can be a challenging task, as the turrets are powerful and well-protected. As a result, players must carefully plan their attacks and coordinate with their teammates in order to succeed.
In addition to the turrets, each team’s base also generates minions that automatically march towards the enemy base. These minions deal damage to enemy minions and buildings along the way. The enemy team will also attempt to protect their turrets and base by attacking your champions. If your champion is killed, they will respawn at your base after a certain amount of time. Alternatively, you can force your opponent to retreat by dealing a sufficient amount of damage to them. In this way, you can protect your own turrets and minions and pave the way for an attack on the enemy base.
Based on the information provided, a successful strategy in League of Legends may involve the following steps:
1. Level up and earn gold by defeating minions or neutral entities on the battlefield.
2. Use the gold to purchase items that increase your champion’s damage or defense.
3. Attack and defeat enemy champions to earn even more gold and increase your team’s chances of victory.
4. Focus on destroying enemy turrets in order to clear the way for an attack on the enemy base.
5. Repeat steps 1-4, prioritizing steps 1-3 in a way that best supports your team’s overall strategy.
There are several common beliefs about what contributes most to a team’s win rate in League of Legends. Testing these beliefs using data analysis and modeling can help determine whether or not they are accurate. For example:
Maximizing champion kills may be seen as a key factor in winning a game, as killing a champion grants a large amount of gold.
The first kill of a game may be particularly important, as it grants a bonus of gold and can help a team gain an early advantage.
Vision score, or the ability to see and track enemy movements on the battlefield, may be critical to winning a game, as it can prevent teams from being ambushed and caught off guard.
Stealing objectives from the enemy team, such as securing a dragon or Baron they were attempting to take, may demoralize the opponent and increase the likelihood of mistakes.
The steps outlined above are a simplified summary of the game mechanics and strategy in League of Legends. In reality, there are many other factors that can impact a team’s win rate, such as stealing neutral entities, placing vision wards, or sacrificing for teammates by taking damage on their behalf. A predictive model can be useful for identifying the most important factors that contribute to a team’s success and prioritizing them in the decision-making process. In addition to making predictions, the model can also help identify which components have the largest impact on win rate and should be prioritized in strategy development.
The dataset was downloaded from Kaggle; it records ranked games created within a single day on the Korean server.
Using a large dataset with millions of observations can provide valuable insights and improve the accuracy of predictive models. However, it can also be challenging to work with such a large dataset, especially if you are using a resource-intensive model like random forest or support vector machines with regularization. In this case, using a smaller subset of the data can help reduce the time and memory requirements of the analysis, while still providing useful insights. By carefully selecting a representative sample of the data, you can still obtain valuable results without overwhelming your computer’s resources. However, it’s important to keep in mind that using a smaller sample may result in less accurate predictions, so it’s a trade-off that should be carefully considered.
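To put a rough number on this trade-off, the standard error of an estimated proportion shrinks with the square root of the sample size, so extra rows pay off quickly at first and then barely at all. A small base-R sketch (no project data needed; the sample sizes are illustrative):

```r
# Standard error of a win-rate estimate near 50% for different sample sizes.
# A ~20,000-row stratified subset (used below) already gives roughly +/- 0.4%.
n  <- c(1e3, 2e4, 2.5e6)      # a small sample, our subset, the full dataset
se <- sqrt(0.5 * 0.5 / n)     # SE of a proportion at p = 0.5
round(se, 4)                  # 0.0158 0.0035 0.0003
```

This is why the analysis below works with a stratified subsample rather than all 2.5 million rows: the precision lost is small relative to the compute saved.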
suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(tidymodels))
suppressPackageStartupMessages(library(corrplot))
suppressPackageStartupMessages(library(discrim))
suppressPackageStartupMessages(library(poissonreg))
suppressPackageStartupMessages(library(corrr))
suppressPackageStartupMessages(library(klaR))
suppressPackageStartupMessages(library(vroom))
suppressPackageStartupMessages(library(MASS))
suppressPackageStartupMessages(library(janitor))
suppressPackageStartupMessages(library(ggcorrplot))
suppressPackageStartupMessages(library(vip))
suppressPackageStartupMessages(library(ranger))
suppressPackageStartupMessages(library(kernlab))
suppressPackageStartupMessages(library(splitstackshape))
suppressPackageStartupMessages(library(xgboost))
tidymodels_prefer()
As the author of the dataset suggests, the original encoding is cp949 (a type of Korean character encoding), so it is wise to convert it to UTF-8 first to avoid encoding issues. Here I use the iconv command-line tool to change the encoding:
iconv -f cp949 -t utf-8 league_data.csv > league_data_utf8.csv
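The same conversion can also be done without leaving R. The snippet below is a sketch, not the pipeline used here: it round-trips a Korean string through CP949 with base R’s `iconv()` (this requires a platform iconv that supports CP949), and the commented call shows vroom’s `locale` argument, which can decode the original file directly without a temporary UTF-8 copy.

```r
# Round-trip a UTF-8 Korean string through CP949 and back.
korean <- "\uc2b9\ub9ac"  # "victory"
cp949  <- iconv(korean, from = "UTF-8", to = "CP949")
utf8   <- iconv(cp949, from = "CP949", to = "UTF-8")
identical(utf8, korean)   # TRUE on systems whose iconv knows CP949

# Alternative to the shell step: let vroom decode cp949 itself.
# league_all_df <- clean_names(
#   vroom("dataset/league_data.csv", locale = locale(encoding = "cp949")))
```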
league_all_df <- clean_names(vroom("dataset/league_data_utf8.csv"))
## Rows: 2589340 Columns: 58
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (10): summonerName, win, teamPosition, visionScore, puuid, summonerId, ...
## dbl (40): no, gameNo, playerNo, participantId, teamId, kills, deaths, assis...
## lgl (6): gameEndedInEarlySurrender, gameEndedInSurrender, teamEarlySurrend...
## dttm (2): CreationTime, KoreanTime
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Drop variables that are not needed
league_all_df <- league_all_df %>%
  select(
    -c(summoner_name,
       puuid,
       summoner_id,
       creation_time,
       participant_id))
Remove rows containing values that do not match the column types
In the case of games that are remade because one or more players disconnected, it is best to exclude those games from the analysis. Remake voting typically occurs within the first 1-2 minutes of a game, so this could serve as a cutoff point for determining which games to include. However, games can continue for a short period after the remake vote has passed, so using a slightly longer boundary of six minutes (360 seconds, matching the filter below) gives more reliable results: any game shorter than that is excluded, while games that continue past the remake window are kept.
# for (col in colnames(league_all_df)) {
# if (is.character(league_all_df[[col]])) {
# print(unique(league_all_df[,col]))
# }
# }
league_all_df <- league_all_df %>%
filter(time_played > 360) %>%
filter(win %in% c("True", "False")) %>%
filter(team_position %in% c("TOP", "JUNGLE", "MIDDLE", "BOTTOM", "UTILITY")) %>%
filter(first_tower_kill %in% c("True", "False"))
# Fixing vroom type import issue
league_all_df$win <- as.logical(league_all_df$win)
league_all_df$first_tower_kill <- as.logical(league_all_df$first_tower_kill)
league_all_df$vision_score <- as.numeric(league_all_df$vision_score)
league_all_df$champ_level <- as.numeric(league_all_df$champ_level)
league_all_df$dragon_kills <- as.numeric(league_all_df$dragon_kills)
Remove NAs - listing column containing NA - remove rows containing NA
# Get columns with na values
na_columns <- names(which(colSums(is.na(league_all_df)) > 0))
print(na_columns)
## [1] "no"
# Loop through the columns and keep only non-NA rows
for (col in na_columns) {
  league_all_df <- league_all_df %>%
    filter(!is.na(.data[[col]]))  # .data[[col]] looks up the column by its name string
}
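Since every row containing an NA is dropped anyway, the loop above is equivalent to tidyr’s `drop_na()` (loaded with the tidyverse). A sketch on a toy data frame:

```r
library(tidyr)

# drop_na() removes any row with at least one missing value, in one pass.
toy <- data.frame(no = c(1, NA, 3), kills = c(2, 5, NA))
drop_na(toy)        # only the first row, which has no NAs, survives
nrow(drop_na(toy))  # 1
```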
Take 10,000 observations from each level of win using stratified sampling (20,000 rows in total)
league_all_df <- stratified(league_all_df, "win", 10000)
league_all_df <- league_all_df %>%
mutate(team = case_when(
team_id == "100" ~ "blue",
team_id == "200" ~ "red",
))
Convert appropriate predictors to factor
str(league_all_df)
## Classes 'data.table' and 'data.frame': 20000 obs. of 54 variables:
## $ no : num 371 110872 204044 258248 248430 ...
## $ game_no : num 6e+09 6e+09 6e+09 6e+09 6e+09 ...
## $ player_no : num 6 6 0 8 3 9 9 3 0 2 ...
## $ korean_time : POSIXct, format: "2022-07-02 09:51:36" "2022-07-02 19:42:06" ...
## $ team_id : num 200 200 100 200 100 200 200 100 100 100 ...
## $ game_ended_in_early_surrender : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ game_ended_in_surrender : logi TRUE TRUE TRUE FALSE FALSE FALSE ...
## $ team_early_surrendered : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ win : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ team_position : chr "JUNGLE" "JUNGLE" "TOP" "BOTTOM" ...
## $ kills : num 2 5 3 2 23 1 3 9 5 5 ...
## $ deaths : num 4 8 0 4 5 8 9 10 5 4 ...
## $ assists : num 4 3 2 4 8 21 9 7 1 6 ...
## $ objectives_stolen : num 0 0 0 0 0 0 0 0 0 0 ...
## $ vision_score : num 21 18 16 6 42 80 39 50 2 22 ...
## $ baron_kills : num 0 1 0 0 0 0 0 0 0 0 ...
## $ bounty_level : num 0 0 3 0 0 0 0 0 0 0 ...
## $ champ_level : num 13 14 12 10 17 16 12 15 11 12 ...
## $ champion_name : chr "Ekko" "Shyvana" "Jayce" "Zeri" ...
## $ damage_dealt_to_buildings : num 0 535 1845 55 9341 ...
## $ damage_dealt_to_objectives : num 20234 21476 1845 2398 36837 ...
## $ detector_wards_placed : num 2 3 1 1 2 3 8 2 0 2 ...
## $ double_kills : num 0 1 0 1 4 0 0 1 0 0 ...
## $ dragon_kills : num 1 1 0 0 1 1 0 0 0 0 ...
## $ first_blood_assist : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ first_blood_kill : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ first_tower_assist : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ first_tower_kill : logi FALSE FALSE FALSE FALSE TRUE FALSE ...
## $ gold_earned : num 9183 10997 6918 7231 21702 ...
## $ inhibitor_kills : num 0 0 0 0 1 0 0 0 0 0 ...
## $ inhibitor_takedowns : num 0 0 0 0 1 0 0 0 0 0 ...
## $ inhibitors_lost : num 0 3 0 2 1 3 3 1 0 0 ...
## $ killing_sprees : num 0 2 1 1 3 0 1 2 2 1 ...
## $ largest_killing_spree : num 0 2 3 2 10 0 2 3 2 4 ...
## $ largest_multi_kill : num 1 2 1 2 2 1 1 3 1 1 ...
## $ longest_time_spent_living : num 572 688 0 457 621 786 323 448 325 559 ...
## $ neutral_minions_killed : num 152 111 0 1 39 4 0 4 4 4 ...
## $ objectives_stolen_assists : num 0 0 0 0 0 0 0 0 0 0 ...
## $ penta_kills : num 0 0 0 0 0 0 0 0 0 0 ...
## $ quadra_kills : num 0 0 0 0 0 0 0 0 0 0 ...
## $ time_c_cing_others : num 15 6 8 7 7 61 49 23 5 4 ...
## $ time_played : num 1563 1965 1036 1292 2153 ...
## $ total_damage_dealt : num 151959 149042 78922 58277 314578 ...
## $ total_damage_dealt_to_champions: num 8564 19938 12433 11629 47639 ...
## $ total_damage_taken : num 26808 48266 7307 12195 29260 ...
## $ total_heal : num 11859 7903 0 1550 5439 ...
## $ total_heals_on_teammates : num 0 0 0 258 547 0 0 0 225 0 ...
## $ total_minions_killed : num 18 44 138 118 247 58 34 40 123 129 ...
## $ total_time_cc_dealt : num 444 668 90 80 147 699 291 91 138 102 ...
## $ total_time_spent_dead : num 86 285 0 92 195 309 235 277 101 102 ...
## $ total_units_healed : num 1 1 0 2 3 1 1 1 3 1 ...
## $ triple_kills : num 0 0 0 0 0 0 0 1 0 0 ...
## $ unreal_kills : num 0 0 0 0 0 0 0 0 0 0 ...
## $ team : chr "red" "red" "blue" "red" ...
## - attr(*, ".internal.selfref")=<externalptr>
league_all_df_factored <- league_all_df %>%
mutate(
across(
c(
win,
team_position,
first_blood_kill,
first_blood_assist,
first_tower_kill,
first_tower_assist,
team,
champion_name),
as.factor)
)
# Releveling
league_all_df_factored <- league_all_df_factored %>%
  mutate(across(
    c(
      win,
      first_blood_kill,
      first_blood_assist,
      first_tower_kill,
      first_tower_assist),
    ~fct_relevel(., c("TRUE", "FALSE"))))
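Note that `fct_relevel()` only reorders the levels; the underlying values are untouched. Putting `"TRUE"` first matters because yardstick treats the first factor level as the event class by default, which is what makes `estimate = .pred_TRUE` the right choice for `roc_auc()` later on. A minimal check on a toy factor:

```r
library(forcats)

f <- factor(c("FALSE", "TRUE", "FALSE"))
levels(f)                                   # alphabetical by default: "FALSE" "TRUE"
levels(fct_relevel(f, c("TRUE", "FALSE")))  # reordered: "TRUE" "FALSE"
```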
Now I will start the exploratory data analysis. First, I will prepare the correlation matrix by dummy-encoding the factorized predictors.
print(colnames(league_all_df_factored))
## [1] "no" "game_no"
## [3] "player_no" "korean_time"
## [5] "team_id" "game_ended_in_early_surrender"
## [7] "game_ended_in_surrender" "team_early_surrendered"
## [9] "win" "team_position"
## [11] "kills" "deaths"
## [13] "assists" "objectives_stolen"
## [15] "vision_score" "baron_kills"
## [17] "bounty_level" "champ_level"
## [19] "champion_name" "damage_dealt_to_buildings"
## [21] "damage_dealt_to_objectives" "detector_wards_placed"
## [23] "double_kills" "dragon_kills"
## [25] "first_blood_assist" "first_blood_kill"
## [27] "first_tower_assist" "first_tower_kill"
## [29] "gold_earned" "inhibitor_kills"
## [31] "inhibitor_takedowns" "inhibitors_lost"
## [33] "killing_sprees" "largest_killing_spree"
## [35] "largest_multi_kill" "longest_time_spent_living"
## [37] "neutral_minions_killed" "objectives_stolen_assists"
## [39] "penta_kills" "quadra_kills"
## [41] "time_c_cing_others" "time_played"
## [43] "total_damage_dealt" "total_damage_dealt_to_champions"
## [45] "total_damage_taken" "total_heal"
## [47] "total_heals_on_teammates" "total_minions_killed"
## [49] "total_time_cc_dealt" "total_time_spent_dead"
## [51] "total_units_healed" "triple_kills"
## [53] "unreal_kills" "team"
# Plot correlation matrix
league_df_eda <- select(league_all_df_factored, -c(no, game_no, player_no, korean_time, team_id, champion_name, game_ended_in_early_surrender, game_ended_in_surrender, team_early_surrendered))
correlations <- model.matrix(~., data = league_df_eda) %>%
cor(use='complete.obs')
## Warning in cor(., use = "complete.obs"): the standard deviation is zero
correlations %>%
ggcorrplot(show.diag = T, type="full", lab=TRUE, lab_size = 2, tl.srt = 90)
The correlation matrix can be useful for identifying predictors that are correlated with losing the game. However, it’s important to note that some of these predictors may be highly collinear, meaning that they are strongly correlated with each other. This can reduce the interpretability of the model and make it difficult to determine the specific factors that are contributing to a team’s success or failure. In order to avoid collinearity and improve the interpretability of the model, it may be necessary to select a subset of predictors that capture the essence of the data while removing redundant or highly correlated predictors.
For example, the following predictors may be disregarded because they are highly correlated with each other or with other predictors of interest:
double_kills, triple_kills, quadra_kills, penta_kills, killing_sprees, largest_killing_spree, and unreal_kills: these predictors are highly correlated with each other, and they are also highly correlated with the kills, gold_earned, and total_damage_dealt predictors, which are our predictors of interest.
total_damage_dealt: this predictor is simply the sum of total_damage_dealt_to_champions and damage to non-champion entities (which can be inferred from other predictors such as neutral_minions_killed and dragon_kills). Because it is collinear with total_damage_dealt_to_champions, it is redundant and can be removed.
total_time_spent_dead: this predictor accumulates the time spent dead over the course of the game, but it is highly correlated with the number of deaths and may not provide additional useful information.
gold_earned: this predictor is largely determined by kills, assists, and buildings destroyed, but I am interested in these components individually rather than as a combined value.
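This kind of pruning can also be automated by flagging any pair of numeric predictors whose absolute correlation exceeds a chosen threshold. A self-contained sketch on toy data (the 0.9 cutoff and column names are illustrative, not the values used for this dataset):

```r
set.seed(42)
toy <- data.frame(
  kills       = 1:20,
  gold_earned = (1:20) * 300 + rnorm(20, sd = 10),  # nearly a function of kills
  deaths      = rnorm(20)                            # unrelated noise
)
m <- cor(toy)

# Row/column indices of highly correlated pairs (upper triangle only,
# so each pair appears once and the diagonal is skipped).
which(abs(m) > 0.9 & upper.tri(m), arr.ind = TRUE)  # flags kills/gold_earned only
```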
The final list of variables becomes:
| Variable | Explanation |
|---|---|
| kills | The number of enemy champions the player killed |
| assists | The number of enemy champion kills the player assisted with |
| deaths | The number of times the player was killed |
| champ_level | The champion’s level when the game ends |
| bounty_level | The player’s bounty level; a higher bounty grants the opponent more gold for killing the player |
| objectives_stolen | The number of dragons, rift heralds, and barons stolen |
| objectives_stolen_assists | The number of objective steals the player assisted with |
| vision_score | The vision score earned from placing wards and clearing enemy wards |
| damage_dealt_to_buildings | Damage dealt to turrets and inhibitors |
| first_blood_assist | Whether the player assisted a teammate in getting first blood |
| first_blood_kill | Whether the player got the first blood kill |
| first_tower_assist | Whether the player assisted a teammate in taking the first tower |
| first_tower_kill | Whether the player took the first tower |
| inhibitor_takedowns | The number of inhibitor destructions the player participated in |
| inhibitors_lost | The number of inhibitors the team lost |
| longest_time_spent_living | The longest time the player survived between consecutive deaths |
| neutral_minions_killed | The number of jungle monsters killed |
| time_c_cing_others | The total time of crowd control applied to enemy champions |
| total_damage_dealt_to_champions | The total damage dealt to enemy champions |
| total_damage_taken | The total damage received from enemy champions |
| total_heal | The total healing received or self-cast |
| total_heals_on_teammates | The total healing cast on teammates |
| total_minions_killed | The total number of minions killed |
| dragon_kills | The number of dragons the player killed |
| baron_kills | The number of barons the player killed |
| team | Either the blue or red team |
In addition to selecting a subset of predictors, it may also be useful to include interactions between certain predictors in the model. Interactions capture the enhancement effect that one predictor has on another, and can provide valuable information about how different factors influence each other. For example, the following interactions may be relevant to the analysis:
kills:champ_level: the more kills a player gets, the more likely they are to gain experience and reach higher levels, which in turn makes it easier to kill more champions. This interaction captures the feedback loop between kills and champion level.
dragon_kills:champ_level: the team that kills a dragon gains bonuses to their stats, which can make it easier to kill more dragons and gain further advantages. This interaction captures the effect of dragon kills on champion level.
inhibitors_lost:deaths: the loss of an inhibitor can make it easier for the enemy team to attack and kill your champions. This interaction captures the relationship between losing an inhibitor and the number of deaths on a team.
Now I will take a look at some distributions in detail.
KDA (kills, deaths, and assists) are the essential stats of the game and are often considered the most straightforward indicators of a player’s performance and skill.
The kill number distribution plot shows that, in general, more kills are associated with a higher likelihood of winning the game. This is especially true for games with more than 7 kills. However, it’s interesting to note that a significant number of players in the dataset have 0 kills and yet still manage to win the game. This suggests that there may be other factors at play, such as a more defensive playing style or a willingness to sacrifice personal kills in order to support the team.
The assist number distribution plot appears to have a similar shape, with more assists being associated with a higher likelihood of winning. This suggests that assists may play a similar role to kills in contributing to a team’s success. Overall, both kills and assists appear to be important factors in determining the outcome of a game.
ggplot(league_all_df_factored, aes(kills)) +
geom_bar(aes(fill = win)) +
scale_fill_manual(values = c("blue", "red"))
ggplot(league_all_df_factored, aes(assists)) +
geom_bar(aes(fill = win)) +
scale_fill_manual(values = c("blue", "red"))
The death number distribution follows the opposite pattern of the kill number distribution. The plot shows a consistent advantage for teams with fewer deaths, with the highest win rate occurring for teams with less than 3 deaths. This suggests that minimizing deaths is an important factor in achieving success in League of Legends. However, it’s also worth noting that games with low death counts make up a significant portion of the overall distribution, so achieving a low death count may not be as difficult as it appears from the plot.
ggplot(league_all_df_factored, aes(deaths)) +
geom_bar(aes(fill = win)) +
scale_fill_manual(values = c("blue", "red"))
These stats are strongly related to the continuous improvement of a champion
The champion level distribution plot suggests that players with a champion level below 14 are more likely to lose the game. However, after reaching level 14, the win ratio appears to remain consistent. This turning point is likely due to game design, as all players in a game typically reach level 14 and higher-level champions no longer have a sharp advantage over lower-level champions. This suggests that reaching level 14 is an important milestone in the game, and players should prioritize leveling up their champions in order to gain this advantage.
ggplot(league_all_df_factored, aes(champ_level)) +
geom_bar(aes(fill = win)) +
scale_fill_manual(values = c("blue", "red"))
The win rate distribution plot for number of minions killed shows that the win rate is consistent around 50% for numbers up to 250. The spike in the number of games at around 40 is likely due to players who take on a utility role, in which they do not take all the minions for themselves but instead share the gold with their teammates. It’s interesting to note that the win rate appears to decrease for numbers greater than 250. One possible explanation for this is that players who kill a large number of minions may restrict the scaling of their teammates’ champions, leading to a lower win rate. This suggests that there may be a trade-off between personal gold acquisition and team success in League of Legends.
ggplot(league_all_df_factored, aes(total_minions_killed)) +
geom_bar(aes(fill = win)) +
scale_fill_manual(values = c("blue", "red"))
The death time distribution plot suggests that players are more likely to lose if they have shorter intervals between deaths. This may be because players who die frequently are unable to contribute as much to the team, limiting the team’s overall effectiveness. It’s also worth noting that the plot shows a peak at around 200 seconds, which could be due to players who are able to avoid dying for extended periods of time but still succumb to frequent deaths. Overall, this plot suggests that minimizing the time between deaths is an important factor in achieving success in League of Legends.
ggplot(league_all_df_factored, aes(longest_time_spent_living)) +
geom_histogram(aes(fill = win), bins = 50) +
scale_fill_manual(values = c("blue", "red"))
The inhibitor loss distribution plot suggests that players are more likely to lose if their team loses one or more inhibitors. This is consistent with the high correlation between win rate and inhibitors_lost, as losing inhibitors can make it easier for the enemy team to attack and defeat your team. The plot shows a steep drop in win rate for teams that lose one or more inhibitors, indicating that this is a critical factor in determining the outcome of a game. Overall, this plot supports the importance of protecting inhibitors in order to maximize your team’s chances of success.
ggplot(league_all_df_factored, aes(inhibitors_lost)) +
geom_bar(aes(fill = win)) +
scale_fill_manual(values = c("blue", "red"))
The distribution of damage dealt to buildings reflects the amount of structural damage typically required to win a game in which neither team surrenders: the combined health of at least five turrets, one inhibitor, and the Nexus. The plot shows that teams must deal substantial structural damage over the course of a game, which provides insight into the overall strategy and pacing of a match and can help teams plan their attack and defense in order to maximize their chances of victory.
mean(league_all_df_factored$damage_dealt_to_buildings)
## [1] 2650.275
ggplot(league_all_df_factored, aes(damage_dealt_to_buildings)) +
  geom_histogram(aes(fill = win), bins = 30) +
  scale_fill_manual(values = c("blue", "red"))
ggplot(league_all_df_factored, aes(damage_dealt_to_buildings)) +
  geom_freqpoly(aes(color = win), bins = 30) +
  scale_color_manual(values = c("blue", "red"))
The dragon kills distribution plot suggests that the number of dragon kills does not have a significant impact on the win rate unless more than 2 dragons are killed. This may be because killing a single dragon provides only a small advantage, but killing multiple dragons can provide a significant boost to a team’s stats and make it easier to win the game. The plot also shows that most teams do not kill any barons, but that getting a baron greatly increases the likelihood of winning. This indicates that barons are powerful objectives that can provide a significant advantage to the team that controls them. Overall, this plot provides insight into the importance of objectives in League of Legends and can help teams plan their strategy accordingly.
ggplot(league_all_df_factored, aes(dragon_kills)) +
geom_bar(aes(fill = win)) +
scale_fill_manual(values = c("blue", "red"))
ggplot(league_all_df_factored, aes(baron_kills)) +
geom_bar(aes(fill = win)) +
scale_fill_manual(values = c("blue", "red"))
Even though each game usually has a dedicated utility champion, other roles can still perform utility actions such as warding or healing.
The vision score distribution plot suggests that having a low vision score is strongly correlated with losing the game. However, the plot also shows that having a high vision score does not significantly affect the win rate. This may be because players are matched with opponents who have similar expertise in vision and map control, so a higher vision score does not provide a significant advantage. The plot shows a steep drop in win rate for vision scores below 20, indicating that adequate vision is crucial for success in League of Legends. Overall, this plot highlights the importance of vision and map control in the game, and can help teams prioritize these aspects of their strategy.
ggplot(league_all_df_factored, aes(vision_score)) +
geom_bar(aes(fill = win)) +
scale_fill_manual(values = c("blue", "red"))
The crowd control duration plot is a bit counterintuitive, as one might expect that teams with more crowd control duration on their opponents would have a higher win rate. However, the plot shows a peak in the win rate for teams with moderate amounts of crowd control duration, with the win rate decreasing for both higher and lower values. This may be because applying too much crowd control can limit a team’s ability to deal damage and secure objectives, while applying too little crowd control may not provide sufficient protection or disruption. The peak in the win rate at around 100 seconds of crowd control duration suggests that this may be an optimal amount for achieving success in League of Legends. Overall, this plot provides insight into the appropriate use of crowd control in the game, and can help teams plan their strategy accordingly.
ggplot(league_all_df_factored, aes(time_c_cing_others)) +
geom_bar(aes(fill = win)) +
scale_fill_manual(values = c("blue", "red"))
The blue team may have a slight edge over the red team due to the asymmetrical map design.
ggplot(league_all_df_factored, aes(team)) +
geom_bar(aes(fill = win)) +
scale_fill_manual(values = c("blue", "red"))
Now, after considering the correlations and the individual distributions, let’s train the models with the final selection of predictors, adding the interactions mentioned above.
It’s important to split the data, since a held-out validation set allows us to assess the performance of the model on unseen data. This matters because it evaluates the model’s ability to generalize to new data, rather than simply fitting the training data. By using a validation set, we can tune the model’s hyperparameters and ensure that it is not overfitting to the training data.
Additionally, stratifying the validation set on the “win” variable is important because it ensures that the distribution of wins and losses in the validation set is representative of the overall distribution in the dataset. This is important because it ensures that the validation set is a fair and unbiased representation of the data, and allows us to accurately evaluate the model’s performance on both winning and losing games. Overall, using a validation set and stratifying on the “win” variable are important steps in building a reliable and effective machine learning model.
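The effect of `strata` is easy to verify on a toy, imbalanced outcome: both halves of a stratified split keep approximately the original class mix. A sketch using rsample (loaded with tidymodels); the 90/10 imbalance is invented for illustration:

```r
library(rsample)

set.seed(1)
toy <- data.frame(win = factor(rep(c("TRUE", "FALSE"), times = c(900, 100))))
sp  <- initial_split(toy, prop = 0.8, strata = win)

# Both partitions preserve roughly the 0.90 / 0.10 class proportions.
prop.table(table(training(sp)$win))
prop.table(table(testing(sp)$win))
```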
df_split <- league_df_eda %>% select(c(
kills,
deaths,
assists,
champ_level,
objectives_stolen,
objectives_stolen_assists,
baron_kills,
dragon_kills,
vision_score,
damage_dealt_to_buildings,
first_blood_assist,
first_blood_kill,
first_tower_assist,
first_tower_kill,
inhibitor_takedowns,
inhibitors_lost,
longest_time_spent_living,
neutral_minions_killed,
time_c_cing_others,
total_damage_dealt_to_champions,
total_damage_taken,
total_heal,
total_heals_on_teammates,
total_minions_killed,
total_time_spent_dead,
team,
win
)) %>%
initial_split(prop = 0.8, strata = win)
league_training <- training(df_split)
league_testing <- testing(df_split)
Now that we have split the data, it’s time to train.
Create a recipe and add a few interaction terms
league_recipe <- league_training %>%
  recipe(win ~ .) %>%
step_dummy(all_factor_predictors()) %>%
step_interact(terms = ~ kills:vision_score) %>%
step_interact(terms = ~ dragon_kills:champ_level) %>%
step_interact(terms = ~ inhibitors_lost:deaths) %>%
step_normalize(all_numeric_predictors()) # Normalize (center and standardization)
league_recipe
## Recipe
##
## Inputs:
##
## role #variables
## outcome 1
## predictor 26
##
## Operations:
##
## Dummy variables from all_factor_predictors()
## Interactions with kills:vision_score
## Interactions with dragon_kills:champ_level
## Interactions with inhibitors_lost:deaths
## Centering and scaling for all_numeric_predictors()
Cross-validation is another critical step in tuning the parameters of a machine learning model. It involves dividing the training data into multiple folds, training the model on different combinations of folds, and evaluating its performance on each combination. This allows us to assess the model’s performance on different subsets of the data, and ensure that it is not overfitting to any particular subset. By using cross-validation, we can tune the model’s hyperparameters to optimize its performance on the training data.
In this case, we are using 5-fold cross-validation, which means that the training data is divided into 5 folds and the model is trained and evaluated on 5 different combinations of these folds. This allows us to assess the model’s performance on a wide range of data, while still reducing the computational cost compared to using a larger number of folds. Additionally, using 5 folds allows us to balance the trade-off between computational efficiency and the model’s ability to generalize to new data. Overall, cross-validation is an essential step in building a reliable and effective machine learning model.
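As a quick sanity check of what `vfold_cv()` produces: with 5 folds, each resample fits on four-fifths of the rows and holds out the remaining fifth. A toy sketch using rsample:

```r
library(rsample)

set.seed(2)
folds <- vfold_cv(data.frame(x = 1:100), v = 5)
nrow(folds)                          # 5 resamples
nrow(analysis(folds$splits[[1]]))    # 80 rows to fit on
nrow(assessment(folds$splits[[1]]))  # 20 rows held out for evaluation
```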
league_folded <- league_training %>%
vfold_cv(v = 5, strata = win)
# set model
log_reg <- logistic_reg() %>%
set_engine("glm") %>%
set_mode("classification")
# setup workflow
log_workflow <- workflow() %>%
add_model(log_reg) %>%
add_recipe(league_recipe)
# fit the model
log_fit <- fit(log_workflow, league_training)
# generate roc_auc
log_roc_auc <- augment(log_fit, new_data = league_testing) %>%
roc_auc(truth = win, estimate = .pred_TRUE)
lda_reg <- discrim_linear() %>%
set_mode("classification") %>%
set_engine("MASS")
lda_workflow <- workflow() %>%
add_model(lda_reg) %>%
add_recipe(league_recipe)
lda_fit <- lda_workflow %>%
fit(league_training)
lda_roc_auc <- augment(lda_fit, new_data = league_testing) %>%
roc_auc(truth = win, estimate = .pred_TRUE)
qda_reg <- discrim_quad() %>%
set_mode("classification") %>%
set_engine("MASS")
qda_workflow <- workflow() %>%
add_model(qda_reg) %>%
add_recipe(league_recipe)
qda_fit <- qda_workflow %>%
fit(league_training)
qda_roc_auc <- augment(qda_fit, new_data = league_testing) %>%
roc_auc(truth = win, estimate = .pred_TRUE)
# Prepare to tune the parameters
elastic_spec <- multinom_reg(penalty = tune(), mixture = tune()) %>%
set_engine("glmnet") %>%
set_mode("classification")
elastic_workflow <- workflow() %>%
add_recipe(league_recipe) %>%
add_model(elastic_spec)
# Regularization
penalty_grid <- grid_regular(penalty(c(-5, 5)), mixture(c(0,1)), levels = 10)
elastic_res <- tune_grid(
elastic_workflow,
resamples = league_folded,
grid = penalty_grid,
)
# Save to a single r data to save time
saveRDS(elastic_res, "save/elastic_res.rds")
The autoplot shows that performance falls off sharply once the regularization penalty exceeds 1.
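For reference, `grid_regular(penalty(c(-5, 5)), mixture(c(0, 1)), levels = 10)` generates the candidate penalties on a log10 scale, so the bounds above are exponents. A quick sketch of the ten lambda values (dials handles this internally; this just shows the spacing):

```python
# The ten candidate lambda values implied by penalty(c(-5, 5)) with levels = 10:
# evenly spaced exponents from -5 to 5, then raised to the power of 10.
levels = 10
lo, hi = -5.0, 5.0
penalties = [10 ** (lo + i * (hi - lo) / (levels - 1)) for i in range(levels)]
print(f"{penalties[0]:g} ... {penalties[-1]:g}")
```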
# Load from already computed results
elastic_res <- readRDS("save/elastic_res.rds")
autoplot(elastic_res)
# Select the best
elastic_best <- select_best(elastic_res, metric = "roc_auc")
# Fit with the best params
elastic_final_fit <- finalize_workflow(elastic_workflow, elastic_best) %>%
fit(league_training)
elastic_roc_auc <- augment(elastic_final_fit, new_data = league_testing) %>%
roc_auc(truth = win, estimate = .pred_TRUE)
elastic_roc_auc
## # A tibble: 1 × 3
## .metric .estimator .estimate
## <chr> <chr> <dbl>
## 1 roc_auc binary 0.979
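The two tuned hyperparameters map onto glmnet's regularization term: `penalty` is lambda and `mixture` is the lasso/ridge blend alpha. A small sketch of the term being traded off (the formula follows glmnet's documentation; the helper name is mine):

```python
def elastic_net_penalty(coefs, penalty, mixture):
    """glmnet's regularization term:
    lambda * (mixture * sum(|b|) + (1 - mixture) / 2 * sum(b^2)),
    where mixture = 1 is the lasso and mixture = 0 is the ridge."""
    l1 = sum(abs(b) for b in coefs)
    l2 = sum(b * b for b in coefs)
    return penalty * (mixture * l1 + (1 - mixture) / 2 * l2)

coefs = [0.5, -2.0, 1.5]
lasso = elastic_net_penalty(coefs, penalty=0.1, mixture=1.0)  # 0.1 * 4.0
ridge = elastic_net_penalty(coefs, penalty=0.1, mixture=0.0)  # 0.1 * 6.5 / 2
print(lasso, ridge)
```

The grid search sweeps both knobs jointly, and the autoplot shows how ROC AUC responds to each.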
For the random forest, I fixed mtry at 5, which is close to the square root of the number of predictors; this is the commonly recommended default for classification.
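A quick sanity check of that heuristic, assuming the 26 predictors reported by the recipe:

```python
import math

# mtry heuristic for classification forests: about sqrt(number of predictors).
# The recipe reports 26 predictors, so:
n_predictors = 26
mtry = round(math.sqrt(n_predictors))  # sqrt(26) is about 5.10
print(mtry)
```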
rf_spec <- rand_forest(mtry = 5, trees = tune(), min_n = tune()) %>%
set_engine("ranger", importance = "impurity") %>%
set_mode("classification")
rf_workflow <- workflow() %>%
add_recipe(league_recipe) %>%
add_model(rf_spec)
rf_grid <- grid_regular(trees(), min_n(), levels = 5)
rf_res <- tune_grid(
rf_workflow,
resamples = league_folded,
grid = rf_grid,
control = control_grid(verbose = TRUE)
)
saveRDS(rf_res, "save/rf_res.rds")
rf_res <- readRDS("save/rf_res.rds")
rf_final_fit <- finalize_workflow(rf_workflow, select_best(rf_res, metric = "roc_auc")) %>%
fit(league_training)
rf_final_fit
saveRDS(rf_final_fit, "save/rf_final_fit.rds")
rf_final_fit <- readRDS("save/rf_final_fit.rds")
rf_roc_auc <- augment(rf_final_fit, new_data = league_testing) %>%
roc_auc(truth = win, estimate = .pred_TRUE)
boost_spec <- boost_tree(trees = tune(), tree_depth = tune()) %>%
set_engine("xgboost") %>%
set_mode("classification")
boost_grid <- grid_regular(trees(), tree_depth(c(2, 8)), levels = 5)
boost_workflow <- workflow() %>%
add_recipe(league_recipe) %>%
add_model(boost_spec)
boost_res <- tune_grid(
boost_workflow,
resamples = league_folded,
grid = boost_grid,
)
saveRDS(boost_res, "save/boost_res.rds")
The tree depth doesn't matter much here, but we do need the full 500 trees.
boost_res <- readRDS("save/boost_res.rds")
autoplot(boost_res)
select_best(boost_res, metric = "roc_auc")
## # A tibble: 1 × 3
## trees tree_depth .config
## <int> <int> <chr>
## 1 500 2 Preprocessor1_Model02
boost_final_fit <- finalize_workflow(boost_workflow, select_best(boost_res, metric = "roc_auc")) %>%
fit(league_training)
boost_roc_auc <- augment(boost_final_fit, new_data = league_testing) %>%
roc_auc(truth = win, estimate = .pred_TRUE)
svm_spec <- svm_poly(degree = 1, cost = tune()) %>%
set_mode("classification") %>%
set_engine("kernlab", scaled = FALSE)
svm_workflow <- workflow() %>%
add_recipe(league_recipe) %>%
add_model(svm_spec)
svm_grid <- grid_regular(cost(c(-5, 5)), levels = 5)
svm_res <- tune_grid(
svm_workflow,
resamples = league_folded,
grid = svm_grid,
control = control_grid(verbose = TRUE)
)
saveRDS(svm_res, "save/svm_res.rds")
The roc_auc barely changes once the cost exceeds 0.125.
svm_res <- readRDS("save/svm_res.rds")
autoplot(svm_res)
svm_final_fit <- finalize_workflow(svm_workflow, select_best(svm_res, metric = "roc_auc")) %>%
fit(league_training)
saveRDS(svm_final_fit, "save/svm_final_fit.rds")
svm_final_fit <- readRDS("save/svm_final_fit.rds")
svm_roc_auc <- augment(svm_final_fit, new_data = league_testing) %>%
roc_auc(truth = win, estimate = .pred_TRUE)
Now that all the models have been trained, it's time to collect the ROC AUC scores and compare them.
model_names <- c("LogisticRegression", "LDA", "QDA", "ElasticNet", "RandomForest", "Boost", "SupportVectorMachine")
model_roc_aucs <- c(
log_roc_auc$.estimate,
lda_roc_auc$.estimate,
qda_roc_auc$.estimate,
elastic_roc_auc$.estimate,
rf_roc_auc$.estimate,
boost_roc_auc$.estimate,
svm_roc_auc$.estimate
)
# Combine the two lists into a data frame
all_roc_aucs <- bind_cols(model_name=model_names, roc_auc=model_roc_aucs)
boost_roc_auc
## # A tibble: 1 × 3
## .metric .estimator .estimate
## <chr> <chr> <dbl>
## 1 roc_auc binary 0.985
# Plot with reordered bars
all_roc_aucs %>%
ggplot(aes(x = reorder(model_name, roc_auc), y = roc_auc)) +
geom_col(width = 0.2) +
theme(text = element_text(size = 12)) +
xlab("Models") + ylab("ROC AUC") +
geom_text(aes(label = roc_auc), position = position_dodge(0.9), vjust = -0.25)
All the models except QDA achieve an ROC AUC above 0.97, and the best is the boosted model, with 500 trees and a tree depth of 2.
For readers unfamiliar with the metric: the ROC AUC measures a model's ability to distinguish between positive and negative examples, with a score of 1.0 indicating perfect separation and a score of 0.5 indicating random guessing. A score of 0.985 is well above the random-guessing baseline and indicates that the model makes highly accurate predictions.
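Equivalently, the ROC AUC is the probability that a randomly chosen positive example receives a higher predicted score than a randomly chosen negative one. A tiny pure-Python sketch of that rank interpretation (toy scores, not the model's output):

```python
def roc_auc(labels, scores):
    """ROC AUC as a rank statistic: the probability that a randomly
    chosen positive outscores a randomly chosen negative (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

labels = [True, True, True, False, False]
scores = [0.9, 0.8, 0.4, 0.5, 0.2]
print(roc_auc(labels, scores))  # 5 of 6 positive-negative pairs ordered correctly
```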
augment(boost_final_fit, new_data = league_testing) %>%
roc_curve(truth = win, estimate = .pred_TRUE) %>%
autoplot()
And here are the accuracy and the confusion matrix on the testing data:
augment(boost_final_fit, new_data = league_testing) %>%
accuracy(truth = win, estimate = .pred_class)
## # A tibble: 1 × 3
## .metric .estimator .estimate
## <chr> <chr> <dbl>
## 1 accuracy binary 0.938
augment(boost_final_fit, new_data = league_testing) %>%
conf_mat(truth = win, estimate = .pred_class) %>%
autoplot()
The results are pretty good, with nearly 0.94 accuracy.
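For completeness, accuracy and the confusion matrix are just tallies over (truth, prediction) pairs; a small sketch on made-up labels (not the actual test set):

```python
from collections import Counter

def confusion_and_accuracy(truth, pred):
    """Tally a 2x2 confusion matrix keyed by (truth, prediction)
    and the accuracy implied by its diagonal."""
    cm = Counter(zip(truth, pred))
    correct = cm[(True, True)] + cm[(False, False)]
    return cm, correct / len(truth)

truth = [True, True, True, False, False, False]
pred = [True, True, False, False, False, True]
cm, acc = confusion_and_accuracy(truth, pred)
print(dict(cm), acc)  # 4 of 6 predictions on the diagonal
```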
There are several reasons why a boosting model might produce the best ROC AUC score compared to other models such as logistic regression, LDA, QDA, elastic net, SVM, and random forest.
One reason is that boosting algorithms are ensemble models, which means that they combine the predictions of multiple weak models to produce a more accurate and robust prediction. This can help to reduce overfitting and improve the model’s ability to generalize to new data.
Another reason is that boosting algorithms typically use decision trees as the weak models, which are powerful and flexible models that can capture complex nonlinear relationships in the data. This can allow the boosting model to make more accurate predictions, especially for datasets with complex and high-dimensional features.
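The variance-reduction intuition behind ensembling can be seen with a toy experiment: averaging many noisy estimates of the same target lands much closer than a typical single estimate. (Boosting actually fits its weak learners sequentially to correct residual errors, but the benefit of combining many weak predictions is the common thread.)

```python
import random

# Averaging many noisy estimates of the same target lands far closer
# than a typical single estimate: the variance-reduction idea behind ensembles.
rng = random.Random(0)
target = 1.0
weak_preds = [target + rng.gauss(0, 1) for _ in range(500)]

mean_single_error = sum(abs(p - target) for p in weak_preds) / len(weak_preds)
ensemble_error = abs(sum(weak_preds) / len(weak_preds) - target)
print(mean_single_error, ensemble_error)
```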
The low ROC AUC score of QDA is likely due to its assumption that the predictors within each class follow a multivariate normal distribution, which is often violated in real-world data. (Note that the equal-covariance assumption belongs to LDA; QDA relaxes it, at the cost of estimating many more parameters.) These limitations help explain why it performed worse than the other models in this analysis.
The variable importance plots from the random forest fit provide valuable insights into the factors that contribute most to the win rate in League of Legends. They show that our first assumption, that kills are a major factor in determining the outcome of a game, is not entirely accurate: kills rank below more than ten other variables. Our second assumption, that first blood is a key predictor of success, is also incorrect; it is one of the least important contributors to the win rate, so even a team that takes first blood should stay cautious.
The plots also support our third assumption, that vision score is an important factor in determining the outcome of a game. However, the interaction between kills and vision score appears to have a greater impact on the win rate than either variable alone. This suggests that players should prioritize both kills and vision score to maximize their chances of success.
Finally, our fourth assumption - that stolen objectives are important for winning games - is proven to be wrong by the plots, which show that stolen objectives have little correlation with the win rate. This may be because players who are forced to steal objectives are often lacking in gold and experience, which makes it difficult for them to fight effectively in team battles. Overall, the variable importance plots provide valuable insights into the factors that drive success in League of Legends, and can help players prioritize their actions in order to maximize their chances of winning.
rf_final_fit %>% extract_fit_parsnip() %>% vip(num_features = 40)
boost_final_fit %>% extract_fit_parsnip() %>% vip(num_features = 40)
What if we had fresh data from games I just played? To grab my own match data, I wrote a Python script that pulls it from the Riot API.
import requests
PUUID = "secret"
API_KEY = "secret"
def get_header():
return {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36",
"Accept-Language": "en-US,en;q=0.9,zh-Hant;q=0.8,zh-Hans;q=0.7,zh;q=0.6",
"Accept-Charset": "application/x-www-form-urlencoded; charset=UTF-8",
}
def get_match_ids(puuid=PUUID, count=20, start=0, api_key=API_KEY):
# set up the url
url = f"https://americas.api.riotgames.com/lol/match/v5/matches/by-puuid/{puuid}/ids?count={count}&start={start}&api_key={api_key}"
# make the request
response = requests.get(url, headers=get_header())
# get the json data
return response.json()
def get_match_data(match_id, api_key=API_KEY):
url = f"https://americas.api.riotgames.com/lol/match/v5/matches/{match_id}?api_key={api_key}"
response = requests.get(url, headers=get_header())
return response.json()
# Get the first 2 matches
match_ids = get_match_ids(PUUID, 20, 0, API_KEY)
match_data_1 = get_match_data(match_ids[0], API_KEY)
match_data_2 = get_match_data(match_ids[1], API_KEY)
class Participant:
assists: int
baronKills: int
bountyLevel: int
champExperience: int
champLevel: int
championId: int
championName: str
championTransform: int
consumablesPurchased: int
damageDealtToBuildings: int
damageDealtToObjectives: int
damageDealtToTurrets: int
damageSelfMitigated: int
deaths: int
detectorWardsPlaced: int
doubleKills: int
dragonKills: int
firstBloodAssist: bool
firstBloodKill: bool
firstTowerAssist: bool
firstTowerKill: bool
gameEndedInEarlySurrender: bool
gameEndedInSurrender: bool
goldEarned: int
goldSpent: int
individualPosition: str
inhibitorKills: int
inhibitorTakedowns: int
inhibitorsLost: int
item0: int
item1: int
item2: int
item3: int
item4: int
item5: int
item6: int
itemsPurchased: int
killingSprees: int
kills: int
lane: str
largestCriticalStrike: int
largestKillingSpree: int
largestMultiKill: int
longestTimeSpentLiving: int
magicDamageDealt: int
magicDamageDealtToChampions: int
magicDamageTaken: int
neutralMinionsKilled: int
nexusKills: int
nexusTakedowns: int
objectivesStolen: int
objectivesStolenAssists: int
participantId: int
pentaKills: int
perks: dict
physicalDamageDealt: int
physicalDamageDealtToChampions: int
physicalDamageTaken: int
profileIcon: int
puuid: str
quadraKills: int
riotIdName: str
riotIdTagline: str
role: str
sightWardsBoughtInGame: int
spell1Casts: int
spell2Casts: int
spell3Casts: int
spell4Casts: int
summoner1Casts: int
summoner1Id: int
summoner2Casts: int
summoner2Id: int
summonerId: str
summonerLevel: int
summonerName: str
teamEarlySurrendered: bool
teamId: int
teamPosition: str
timeCCingOthers: int
timePlayed: int
totalDamageDealt: int
totalDamageDealtToChampions: int
totalDamageShieldedOnTeammates: int
totalDamageTaken: int
totalHeal: int
totalHealsOnTeammates: int
totalMinionsKilled: int
totalTimeCCDealt: int
totalTimeSpentDead: int
totalUnitsHealed: int
tripleKills: int
trueDamageDealt: int
trueDamageDealtToChampions: int
trueDamageTaken: int
turretKills: int
turretTakedowns: int
turretsLost: int
unrealKills: int
visionScore: int
visionWardsBoughtInGame: int
wardsKilled: int
wardsPlaced: int
win: bool
def __init__(self, participant_data):
self.assists = participant_data["assists"]
self.baronKills = participant_data["baronKills"]
self.bountyLevel = participant_data["bountyLevel"]
self.champExperience = participant_data["champExperience"]
self.champLevel = participant_data["champLevel"]
self.championId = participant_data["championId"]
self.championName = participant_data["championName"]
self.championTransform = participant_data["championTransform"]
self.consumablesPurchased = participant_data["consumablesPurchased"]
self.damageDealtToBuildings = participant_data["damageDealtToBuildings"]
self.damageDealtToObjectives = participant_data["damageDealtToObjectives"]
self.damageDealtToTurrets = participant_data["damageDealtToTurrets"]
self.damageSelfMitigated = participant_data["damageSelfMitigated"]
self.deaths = participant_data["deaths"]
self.detectorWardsPlaced = participant_data["detectorWardsPlaced"]
self.doubleKills = participant_data["doubleKills"]
self.dragonKills = participant_data["dragonKills"]
self.firstBloodAssist = participant_data["firstBloodAssist"]
self.firstBloodKill = participant_data["firstBloodKill"]
self.firstTowerAssist = participant_data["firstTowerAssist"]
self.firstTowerKill = participant_data["firstTowerKill"]
self.gameEndedInEarlySurrender = participant_data["gameEndedInEarlySurrender"]
self.gameEndedInSurrender = participant_data["gameEndedInSurrender"]
self.goldEarned = participant_data["goldEarned"]
self.goldSpent = participant_data["goldSpent"]
self.individualPosition = participant_data["individualPosition"]
self.inhibitorKills = participant_data["inhibitorKills"]
self.inhibitorTakedowns = participant_data["inhibitorTakedowns"]
self.inhibitorsLost = participant_data["inhibitorsLost"]
self.item0 = participant_data["item0"]
self.item1 = participant_data["item1"]
self.item2 = participant_data["item2"]
self.item3 = participant_data["item3"]
self.item4 = participant_data["item4"]
self.item5 = participant_data["item5"]
self.item6 = participant_data["item6"]
self.itemsPurchased = participant_data["itemsPurchased"]
self.killingSprees = participant_data["killingSprees"]
self.kills = participant_data["kills"]
self.lane = participant_data["lane"]
self.largestCriticalStrike = participant_data["largestCriticalStrike"]
self.largestKillingSpree = participant_data["largestKillingSpree"]
self.largestMultiKill = participant_data["largestMultiKill"]
self.longestTimeSpentLiving = participant_data["longestTimeSpentLiving"]
self.magicDamageDealt = participant_data["magicDamageDealt"]
self.magicDamageDealtToChampions = participant_data["magicDamageDealtToChampions"]
self.magicDamageTaken = participant_data["magicDamageTaken"]
self.neutralMinionsKilled = participant_data["neutralMinionsKilled"]
self.nexusKills = participant_data["nexusKills"]
self.nexusTakedowns = participant_data["nexusTakedowns"]
self.objectivesStolen = participant_data["objectivesStolen"]
self.objectivesStolenAssists = participant_data["objectivesStolenAssists"]
self.participantId = participant_data["participantId"]
self.pentaKills = participant_data["pentaKills"]
self.perks = participant_data["perks"]
self.physicalDamageDealt = participant_data["physicalDamageDealt"]
self.physicalDamageDealtToChampions = participant_data["physicalDamageDealtToChampions"]
self.physicalDamageTaken = participant_data["physicalDamageTaken"]
self.profileIcon = participant_data["profileIcon"]
self.puuid = participant_data["puuid"]
self.quadraKills = participant_data["quadraKills"]
self.riotIdName = participant_data["riotIdName"]
self.riotIdTagline = participant_data["riotIdTagline"]
self.role = participant_data["role"]
self.sightWardsBoughtInGame = participant_data["sightWardsBoughtInGame"]
self.spell1Casts = participant_data["spell1Casts"]
self.spell2Casts = participant_data["spell2Casts"]
self.spell3Casts = participant_data["spell3Casts"]
self.spell4Casts = participant_data["spell4Casts"]
self.summoner1Casts = participant_data["summoner1Casts"]
self.summoner1Id = participant_data["summoner1Id"]
self.summoner2Casts = participant_data["summoner2Casts"]
self.summoner2Id = participant_data["summoner2Id"]
self.summonerId = participant_data["summonerId"]
self.summonerLevel = participant_data["summonerLevel"]
self.summonerName = participant_data["summonerName"]
self.teamEarlySurrendered = participant_data["teamEarlySurrendered"]
self.teamId = participant_data["teamId"]
self.teamPosition = participant_data["teamPosition"]
self.timeCCingOthers = participant_data["timeCCingOthers"]
self.timePlayed = participant_data["timePlayed"]
self.totalDamageDealt = participant_data["totalDamageDealt"]
self.totalDamageDealtToChampions = participant_data["totalDamageDealtToChampions"]
self.totalDamageShieldedOnTeammates = participant_data["totalDamageShieldedOnTeammates"]
self.totalDamageTaken = participant_data["totalDamageTaken"]
self.totalHeal = participant_data["totalHeal"]
self.totalHealsOnTeammates = participant_data["totalHealsOnTeammates"]
self.totalMinionsKilled = participant_data["totalMinionsKilled"]
self.totalTimeCCDealt = participant_data["totalTimeCCDealt"]
self.totalTimeSpentDead = participant_data["totalTimeSpentDead"]
self.totalUnitsHealed = participant_data["totalUnitsHealed"]
self.tripleKills = participant_data["tripleKills"]
self.trueDamageDealt = participant_data["trueDamageDealt"]
self.trueDamageDealtToChampions = participant_data["trueDamageDealtToChampions"]
self.trueDamageTaken = participant_data["trueDamageTaken"]
self.turretKills = participant_data["turretKills"]
self.turretTakedowns = participant_data["turretTakedowns"]
self.turretsLost = participant_data["turretsLost"]
self.unrealKills = participant_data["unrealKills"]
self.visionScore = participant_data["visionScore"]
self.visionWardsBoughtInGame = participant_data["visionWardsBoughtInGame"]
self.wardsKilled = participant_data["wardsKilled"]
self.wardsPlaced = participant_data["wardsPlaced"]
self.win = participant_data["win"]
class MatchData:
game_type: str
game_duration: int
gameMode: str
mapId: int
participant: list
teams: list
def __init__(self, match_data):
self.game_type = match_data["info"]["gameType"]
self.game_duration = match_data["info"]["gameDuration"]
self.gameMode = match_data["info"]["gameMode"]
self.mapId = match_data["info"]["mapId"]
self.teams = match_data["info"]["teams"]
self.participant = []
for participant in match_data["info"]["participants"]:
self.participant.append(Participant(participant))
def get_participant_stats(game_data, puuid=PUUID):
for participant in game_data.participant:
if participant.puuid == puuid:
return participant
def print_stats(stat: Participant):
    # Print the fields comma-separated, in the same order as the
    # training data columns, so the line can be pasted into R.
    fields = [
        stat.kills, stat.deaths, stat.assists, stat.champLevel,
        stat.objectivesStolen, stat.objectivesStolenAssists,
        stat.baronKills, stat.dragonKills, stat.visionScore,
        stat.damageDealtToBuildings,
        stat.firstBloodAssist, stat.firstBloodKill,
        stat.firstTowerAssist, stat.firstTowerKill,
        stat.inhibitorTakedowns, stat.inhibitorsLost,
        stat.longestTimeSpentLiving, stat.neutralMinionsKilled,
        stat.timeCCingOthers, stat.totalDamageDealtToChampions,
        stat.totalDamageTaken, stat.totalHeal, stat.totalHealsOnTeammates,
        stat.totalMinionsKilled, stat.totalTimeSpentDead,
        "blue" if stat.teamId == 100 else "red",
        stat.win,
    ]
    print(",".join(str(f) for f in fields))
game_1_stat = get_participant_stats(MatchData(match_data_1))
game_2_stat = get_participant_stats(MatchData(match_data_2))
print_stats(game_1_stat)
print_stats(game_2_stat)
Running the code above produces the following strings (after a few manual tweaks). And yes, the model gives the correct predictions!
obs1 <- paste("8,7,4,16,0,0,0,0,6,2189,False,False,False,False,0,3,495,1,2,27427,28297,458,0,174,227,blue,False")
obs2 <- paste("4,3,4,12,0,0,0,0,9,1750,False,False,False,False,1,0,784,1,3,11734,8750,392,0,104,62,blue,True")
# Get the dataframe from text
obs1 <- read.table(text = obs1, sep = ",", col.names = colnames(league_training))
obs2 <- read.table(text = obs2, sep = ",", col.names = colnames(league_training))
# Factorization
new_testing <- bind_rows(obs1, obs2)
new_testing$win <- factor(as.logical(new_testing$win))
new_testing$first_tower_kill <- factor(as.logical(new_testing$first_tower_kill))
new_testing$first_tower_assist <- factor(as.logical(new_testing$first_tower_assist))
new_testing$first_blood_kill <- factor(as.logical(new_testing$first_blood_kill))
new_testing$first_blood_assist <- factor(as.logical(new_testing$first_blood_assist))
new_testing$team <- factor(new_testing$team)
# Predict the new data
augment(rf_final_fit, new_data = new_testing) %>%
select(win, .pred_class)
## # A tibble: 2 × 2
## win .pred_class
## <fct> <fct>
## 1 FALSE FALSE
## 2 TRUE TRUE
Even though the above models offer excellent prediction accuracy, they are based on end-game stats and may not be useful for players who want to know their chances of winning before the game starts. In these cases, players only have information about the champions that will be in the game, but not about how well they and their opponents will perform. Using only this information, it is difficult to accurately predict the outcome of a game.
To test the feasibility of predicting the win rate from champion information alone, I built a DNN model. The results were not as good as I had hoped: a validation accuracy of only 51.62%, only slightly better than random guessing. This is likely because the game data was all recorded on the same day, so regular champion balancing tends to even out per-champion win rates. Overall, champion information alone does not seem to be a reliable way to predict the outcome of a game.
The extra analysis, written in Python, can be found in the same folder as analysis-extra.html.
The analysis suggests that the best model to predict the win rate of a League of Legends game based on end-game stats is a boosting model. This is likely due to the fact that boosting algorithms are ensemble models that combine the predictions of multiple weak models, and are able to capture complex nonlinear relationships in the data.
It is worth noting that the analysis only used a subset of the available data, and the performance of the model may improve if more of the dataset is used. However, even with the limited data, the model achieved an impressive AUC ROC score of over 0.98, indicating that it is highly accurate at predicting the outcome of games.
Still, there are several ways that the analysis and modeling process could be further improved. For example, the analysis could be extended to include more predictors, such as player skill level, game mode, and the specific champions used by each player. This could provide additional insights into the factors that influence the win rate, and allow the model to make more accurate predictions.
Another potential improvement would be to use more advanced techniques for feature engineering and selection. For example, the analysis could incorporate techniques such as dimensionality reduction, feature selection algorithms, and interaction terms, to identify the most important features and improve the model’s predictive power.
Additionally, the analysis could be extended to include more advanced models, such as deep learning neural networks or gradient boosting machines, which are known to perform well on complex and high-dimensional datasets. This could further improve the model’s performance and enable it to make even more accurate predictions.
Overall, there are many potential ways to improve the analysis and modeling process, and further exploration and experimentation could provide valuable insights into the factors that influence the win rate in League of Legends.